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components of 
matrix [M] 
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steps in multi- 
step one-derivative 
scheme (see 
eq. (Al)) 
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integer which de- 
pends on number 
of symmetry opera- 
tions used 
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partitions of 
vector {N} 
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vector of external 
forces 

Vo 

intensity of normal 
loading 
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right-hand side 
vector (see eqs. ( 4 )) 

{Qv} 

vector defined in 
equation (All) 

{Qh,{Q}2,{Q} 3 

partitions of 
vector {Q} 

{QMQMO}, 

components of 
vector {Q} (see 
eqs. ( 7 ) and (8)) 

{R},{k},{R},{k} 

symmetric /antisym- 
metric components 
of residual vectors 
used in PCG 
technique 
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radius of curvature 
of cylindrical panel 
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strain-displacement 
matrix (see eqs. (2)) 

S'? 
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obtained by using 
proposed strategy 
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matrix [S] 
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[5], [5] 

components of 
matrix [ S } 

\ThWvUTx) 

transformation 
matrices (see 
eqs. (B15) to (B19)) 

[ Th], [ T h ], [T*], 

[Tx),[f Q ),[t Q ] 

matrices defined in 
equations (Bll) to 
(B14) 

[TV], [TV] 

transformation ma- 
trices (see eqs. (6)) 

t 

time 

U c 

total complementary 
strain energy of 
structure 

u l} u 2 ,w 

displacement com- 
ponents of mid- 
dle surface of 
shell in coordinate 
directions 

{V} 

vector of nodal 
velocities 

Wl.W 2 .W 3 

partitions of 
vector {V} 

W 

vector of nodal 
displacements 

Wi.W2.W3 

partitions of 
vector {X} 

zi.Z2.Z3 

orthogonal coordi- 
nate system 

w.w.W.W 

symmetric/antisym- 
metric components 
of residual vec- 
tors used in PCG 
technique 

Cz},{~zuh{h 

symmetric/antisym- 
metric components 
of conjugate search 
direction vectors 

a i . Pi 

coefficients in 
multistep operators 
(see table AI) 

& 3 

step length for j th 
correction vector of 
PCG technique 


Pj+i 

orthogonalization 
coefficient used 
in j th iteration of 
PCG technique 

[r.],[Ta] 

transformation 

matrices 

At 

time step used in 
temporal integration 
scheme 

6 

prescribed tolerance 
in PCG technique 

Ao. Ai, A 

tracing parameters 
identifying correc- 
tion terms to sym- 
metric response 
vectors 

V LT 

major Poisson’s 
ratio of individual 
layers of panel 


rotation components 
of middle surface of 
panel 

w 

vector of stress 
parameters and 
nodal velocities 

$}, {^}, w, w 

symmetric/antisymmetric 
components of vec- 
tor {t/>} 

n.n 

original and effective 
element domains 
(see fig. 1) 

Subscripts: 

a 

antisymmetric 

component 

i 

time 

j 

PCG iteration cycle 

max 

maximum value 

P,Q 

range from 1 to 
total number of 
nodal velocities in 
model 

P 

ranges from 1 to 
total number of 
stress- resultant 
parameters in model 
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symmetric 

component 

Superscripts: 

( r ) 

N e w t on-Raphson 
iteration cycle 

T 

transposition 

Abbreviations: 


CP 

central processor 

CPU 

central processing 
unit 


ECL 

emitter coupled 
logic 

MFLOPS 

millions of floating 
point operations per 
second 

PCG 

preconditioned 
conjugate gradient 


A dot over a symbol indicates a derivative with 
respect to time; a bar (”) over a vector refers to 
symmetric components; a tilde (~) over a vector refers 
to antisymmetric components. 
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Abstract 

A computational procedure is presented for the 
nonlinear dynamic analysis of unsymmetric struc- 
tures on vector multiprocessor systems. The pro- 
cedure is based on a novel hierarchical partitioning 
strategy in which the response of the unsymmetric 
structure at any time instant is approximated by a 
linear combination of symmetric and antisymmetric 
response vectors, each obtained by using only a frac- 
tion of the degrees of freedom of the original finite ele- 
ment model. The three key elements of the procedure 
which result in a high degree of concurrency through- 
out the solution process are (1) mixed (or primitive 
variable) formulation with independent shape func- 
tions for the different fields, (2) operator splitting or 
restructuring of the discrete equations at each time 
step to delineate the symmetric and antisymmetric 
vectors constituting the response, and (3) two-level 
iterative process for generating the response of the 
structure. An assessment is made of the effectiveness 
of the procedure on the CRAY X-MP/4 computers. 

1. Introduction 

Dynamic response calculations for future aero- 
space systems are expected to require processing 
rates far in excess of the upper limit of computers 
built around a single processing unit. This is because 
of the complexity of these systems and the high de- 
gree of sophistication of the computational models re- 
quired for simulating their response (see, for example, 
ref. 1). The introduction of novel forms of machine 
architecture has brightened the prospects for meet- 
ing future large-scale computational needs. Most of 
the new machines achieve high performance through 
vectorization and/or parallelism. These include top- 
of-the-range supercomputers such as CRAY X-MP, 
CRAY 2, and ETA- 10, as well as computers based 
on readily available and inexpensive basic units such 
as the hypercubes and the Connection Machine. The 
characteristics of several new machines are summa- 
rized in references 2 to 5. 

Much work has been devoted to the develop- 
ment of vector and parallel numerical algorithms for 
performing matrix operations, solution of algebraic 
equations, and extraction of eigenvalues. (See, for ex- 
ample, refs. 6 to 10.) Also, a number of special strate- 
gies have been proposed for increasing the degree 
of parallelism and/or vectorization in finite element 
computations. These strategies include domain de- 
composition (or substructuring), operator splitting, 
and element-by-element techniques. They are re- 
lated to the “divide and conquer” paradigm based on 
breaking a large (complex) problem into a number of 
smaller (simpler) subproblems which may be solved 


independently on distinct processors. The degree 
of independence of the subproblems is a measure of 
the effectiveness of the algorithm since it determines 
the amount and frequency of communication and 
synchronization . 

Substructuring techniques when combined with 
operator splitting and iterative procedures result in 
computational strategies which are well-suited for 
parallel vectorization. (See refs. 11 and 12.) How- 
ever, for the strategy to be effective the following 
two conditions must be satisfied: 

1. The partitioning of a discretized structure into 
substructures must achieve well-balanced workload 
distribution among the different processors. 

2. The iterative process used to account for the 
coupling between the different substructures must 
be rapidly converging. This, in turn, requires the 
substructures not to be strongly coupled. 

The two conditions are difficult to satisfy, and 
to date, despite a number of attempts to develop 
partitioning strategies (see, for example, refs. 13 
and 14), no general strategy exists which satisfies 
the aforementioned two conditions. The present 
study focuses on this problem. Specifically, the 
objective of this paper is to present an effective 
hierarchical partitioning strategy for use in large- 
scale nonlinear finite element dynamic analysis on 
multiprocessor computers. To sharpen the focus of 
the study, multiprocessor computers with a small 
number of powerful processors such as the CRAY 
X-MP/4 are considered. 

The authors had useful discussions with Huey 
Carden of NASA Langley Research Center, Ahmed 
Sameh of the University of Illinois at Urbana- 
Champaign, and Chris Hsiung of CRAY Research, 
Inc., Chippewa Falls, Wisconsin. Also, Phuong 
H. Vu and Richard Steimann of CRAY Research, 
Inc., Mendota Heights, Minnesota, helped in the 
computer 
implementation. 

2. Mathematical Formulation 

The analytical formulation is based on a 
moderate-rotation geometrically nonlinear theory of 
the structure. (See ref. 15.) All the dissipative forces 
are neglected. A mixed formulation is used in which 
the fundamental unknowns consist of the internal 
forces (or stress resultants), the generalized displace- 
ments, and the velocity components of the structure. 
A total Lagrangian description of the deformation of 
the structure is used, in which the configurations of 
the structure at different times are referred to the ini- 
tial coordinate system of the undeformed structure. 

The spatial discretization is performed by divid- 
ing the structure into finite elements. The degree of 



the polynomial shape functions used for approximat- 
ing the generalized displacements and velocity com- 
ponents differs from the degree of the polynomials 
used in approximating the stress resultants. More- 
over, continuity of stress resultants is not imposed 
at interelement boundaries. The semidiscrete sys- 
tem of equations governing the dynamic response of 
the structure consists of the following three sets of 
relations: 

Constitutive relations: 

[F]{H} = [S}{X} + {G(X)} (1) 

Equations of motion: 

(M]{n = {P} - [S] r {tf } - {N(H,X)} (2) 

Relations between velocity and displacement 
components: 

{X} = {V} (3) 

where {X} and {V} are the vectors of unknown nodal 
displacements and velocities, respectively; {H} is the 
vector of stress-resultant parameters; [JF] is a block 
diagonal matrix of element flexibilities (with each di- 
agonal block associated with the flexibility matrix of 
a single element); [5] is the linear strain-displacement 
matrix; {P} is the vector of external forces; [ M ] is 
a banded matrix of the consistent mass coefficients; 
{G(X)} and {N(H,X)} are the vectors of non- 
linear terms; superscript T denotes transposition; 
and a dot over the symbol indicates a derivative 
with respect to time. In the moderate-rotation the- 
ory used herein, the vector (G(X)} is quadratic in 
{X}; and the vector {N(H,X)} is bilinear in {H} 
and {*}. 

The temporal integration is performed by using 
an implicit, multistep, one-derivative scheme. The 
resulting nonlinear simultaneous algebraic equations 
at each time step are solved by using a Newton- 
Raphson iterative technique. The iterative process 
is represented by the following equations for the 
stress parameters and velocity components of the rth 
iteration cycle (see appendix A): 

[Kp{ AV>}S r) = {Q>i r) (4) 

WS r+1) = WS r) + {A^ r) (5) 

where is a global structure matrix which in- 

cludes the flexibility [F], linear strain displacement 
[5], mass [M], and nonlinear contributions {G(A')} 
and {iV(iJ,X)} from different elements (see appen- 
dix A); {Q}^ is the global right-hand side vector; 


and {^}^ r+ = {y}/ are the stress parameters 

and velocity components at the end of the rth iter- 
ation cycle at time z. The procedure for evaluating 

the corresponding nodal displacements, {X}^ r+1 \ is 
outlined in appendix A. 

3. Basic Idea and Key Elements of the 
Proposed Procedure 

The proposed procedure is based on approximat- 
ing the response of the unsymmetric structure, at 
any time instant, by a linear combination of symmet- 
ric and antisymmetric response vectors (or modes). 
Each of these modes is generated with the use of 
a reduced-size model of the original structure. The 
size of the reduced model depends on the number of 
symmetry operations used. 

The model-size reduction process is depicted in 
figures 1 and 2. It can be thought of as a hierarchi- 
cal partitioning strategy, in which the original struc- 
ture is replaced by an equivalent symmetrized struc- 
ture, having the same symmetric/antisymmetric 
components of the response. 

The top left sketch in figure 1 shows the origi- 
nal unsymmetric finite element model. The response 
vector { V } is decomposed into symmetric and anti- 
symmetric vectors {V} s and {V} a with respect to 
line ab. (See fig. 2.) Each pair of complementary 
elements (with respect to ab) is replaced by a sin- 
gle effective element in the symmetrized structure. 
Since each of the symmetric and antisymmetric com- 
ponents of the response vector is generated by using 
only one half the symmetrized structure, the decom- 
position process amounts to partitioning the original 
structure into two substructures. 

The process is repeated to effect further reduc- 
tion in the size of the substructures by^ decom- 
posing each of the vectors {V*} s and {V} a into 
symmetric/antisymmetric subvectors with respect to 
line cd in figure 1. Each of the resulting four vec- 
tors {^}, {V }, {V^ and {V} can be generated by us- 
ing only one quadrant of the symmetrized structure 
(shown in the bottom right sketch of fig. 1). Hence- 
forth, a bar (“) and a tilde (~) over a vector refer to 
the symmetric and antisymmetric components of the 
vector, respectively. 

If the fundamental unknowns are selected to be 
the symmetric/antisymmetric components of the re- 
sponse vector, the governing equations in these un- 
knowns can be obtained by using simple matrix 
transformations on pairs of complementary elements 
(see appendix B). For symmetric structures, the gov- 
erning equations in the symmetric/antisymmetric 
components of the response vector are uncoupled. 
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On the other hand, for unsymmetric structures, the 
equations are coupled. In the proposed procedure, 
the uncoupled equations are used as a preconditioner, 
and the coupling is accounted for by using an itera- 
tive process. 

The three key elements of the proposed procedure 
are (1) mixed (primitive variable) formulation with 
independent shape functions for the stress resultants, 
generalized displacements, and velocity components, 
and with the stress resultants allowed to be discontin- 
uous at interelement boundaries; (2) operator split- 
ting or restructuring of the discrete equations of the 
structure to delineate the contributions to the sym- 
metric and antisymmetric vectors constituting the 
response; and (3) two-level iterative process (with 
nested iteration loops) to generate the symmetric and 
antisymmetric components of the response vectors at 
each time step. The top- and bottom-level iterations 
(outer and inner iteration loops) are performed by us- 
ing Newton-Raphson and preconditioned conjugate 
gradient (PCG) techniques, respectively. 

4. Computational Procedure 

A flowchart of the computational procedure is 
given in table I. The application of the proposed 
procedure to structural dynamics problems can be 
conveniently divided into two phases, each of which 
is well-suited for parallel processing: 

1. Preprocessing phase in which the finite element 
model of the entire structure is used. The major steps 
in this phase are hierarchical partitioning of the re- 
sponse vectors, introducing the various transforma- 
tion matrices (see appendix B and ref. 16), appli- 
cation of operator splitting, and generation of the 
various arrays for the reduced-size model through 
successive use of pairs of complementary elements 
from the model. (See fig. 1.) The number of de- 
grees of freedom in the reduced-size model is approx- 
imately 1/m that of the original finite element model, 
where m is an integer which depends on the num- 
ber of symmetry operations used. For the sake of 
load balancing, it is desirable to select the symme- 
tries such that m equals the number of processors 
(i.e., 4 for the CRAY X-MP/4). 

2. Solution phase is which only the reduced-size 
model is used in generating the transient response 
of the structure. The fundamental unknowns in 
this phase are the velocity components, stress resul- 
tants, and generalized displacements of one of the 
substructures. 

A two-level iterative process (with two nested 
iteration loops) is used to generate the symmet- 
ric/antisymmetric components of the response vec- 
tors at each time step. The top-level iteration (outer 


iteration loop) is the Newton-Raphson iteration, and 
the bottom-level iteration (inner iteration loop) is the 
PCG iteration. The details of the PCG iterations are 
given in appendix C. 

The following comments regarding the procedure 
seem to be in order: 

1. The decomposition of the response vectors into 
symmetric and antisymmetric components can be 
performed by means of matrix transformations. For 
example, the symmetric and antisymmetric compo- 
nents of the velocity vector can be expressed as 
follows (see fig. 2): 


{V}s = [T V }l{V} {V} a = [T v } 1 {V} 
{V}s,s = [fv}2{V}s {V}s,a = [fvh{V}s > 
{V} a , a = [f V ] 2 {V}a {V} a , a = [T V } 2 {V} a j 


(6) 


The symmetry transformations and the explicit form 
of the transformation matrices [Ty] and [Ty] are 
given in appendix B. 

2. The aforementioned transformations are used 
in conjunction with the operator splitting tech- 
nique to delineate the different symmetric and anti- 
symmetric components of the response. For the case 
of one symmetric operation (m = 2), equations (4) 
can be cast as 



+ Xq[K] 


{A^} = {Q} + A 0 {g} 


( 7 ) 


where the subscript i and the superscript (r) are 
dropped for convenience. Similarly, for the case of 
two symmetry operations (m = 4), equations (4) can 
be cast as 


[R} + X 1 [R} + Xo{[K] + Xi[k]^ {AV-} 

= {0} + -^i{0} + ({g} + -^i{g}) (8) 


Tracing parameters Aq and Ai are introduced 
in equations (7) and (8) to identify the correction 
terms to the symmetric response vectors. The case 
Ao = 0 in equations (7) corresponds to a symmet- 
ric response with respect to ab (see fig. 1), and the 
case Aq = Ai = 0 in equations (8) corresponds to a 
symmetric response with respect to both cb and cd. 
For Ao = Ai — 1, equations (7) and (8) are equiva- 
lent to equations (4). Therefore, Aq and Ai are trac- 
ing parameters which identify all the coupling terms 
to the symmetric/antisymmetric components of the 
response vectors. The explicit forms of the arrays 
[K] t [K] y {Q}, and {Q} in equations (7) are given in 
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appendix B (eqs. (B22) to (B25)). The vector {At/>} 
refers to the increments of the symmetric and the 
antisymmetric components of the response vector. 
As shown in appendix B, the equations associated 
with each of the symmetric and antisymmetric com- 
ponents of the response vector are the same, except 
for the terms associated with the unknowns at the 
interfaces. Therefore, only one set of these equations 
needs to be considered. The sizes of the arrays in 
equations (7) are approximately 1/2 in the original 
equations (eqs. (4)). Similarly, the sizes of the arrays 
in equations (8) are approximately 1/4 the size of the 
corresponding ones in equations (4). 

3. Equations (8) (or (7)) are solved by using the 
preconditioned conjugate gradient (PCG) technique. 
The matrix [K] (or [if]) is selected as the precon- 
ditioning matrix. For the case m = 4, the precon- 
ditioning matrix needs to be decomposed four times 
corresponding to the four different combinations of 
symmetry and antisymmetry conditions along cb and 
cd (see fig. 1). The four decompositions are done con- 
currently on different processors (whenever possible) . 

4. The preconditioned residuals in the PCG tech- 
nique have regular, predetermined patterns of sym- 
metry and antisymmetry with respect to cb and cd. 
These patterns are exploited in the solution pro- 
cess, in reducing the amount of computations, as 
well as in balancing the computational load between 
processors. 

For the casejn == 2, the procedure for generating 
the matrices [K], [K], {Q}, and {Q} is outlined in 
appendix B and reference 16. The matrices for 
the case m =4 are obtained by repeating the same 
process as that used for m = 2. 

5. Implementation on CRAY X-MP 
Computers 

The proposed computational procedure based on 
hierarchical partitioning has been implemented on 
the CRAY X-MP/4 computer system at Cray Re- 
search, Inc., Mendota Heights, Minnesota. The ma- 
jor characteristics of the computer system are sum- 
marized in appendix D. The program was coded in 
FORTRAN using Cray FORTRAN (CFT) compiler 
directives for vectorization. 

To improve the performance an attempt was 
made to maximize the use of the chaining facility 
of the CRAY computer (overlapping vector multiply, 
add, and memory access) whenever possible (steps 5, 

6, 7, 8, and 11 of table I). This was accomplished 
by calling the SCILIB routines for performing the 


following six matrix/vector operations: 


SMXPY : 

{Y} + [M}{X} 

SXMPY : 

{Y} + [M] t {X} 

MXV : 

[M]{X} 

MXVA : 

m T {x) 

MXM : 

[M][A] 

MXMA : 

[M) t [A] 


where [M], [A] are matrices; {AT},{Y} are vectors; 
and superscript T denotes transposition. 

The decomposition and back substitution in 
steps 10 and 11 (table I) were performed by using 
optimized versions of the LINPACK routines SPBFA 
and SPBSL which make extensive use of the chaining 
facility of the CRAY computer and have their inner- 
most loops coded in assembly language. (See refs. 17 
and 18.) 

Multiprocessing was achieved by using the micro- 
tasking facility of the CRAY X-MP computer. This 
was accomplished by placing special directives in the 
code, then passing the code through a utility pre- 
processor called PREMULT to interpret these di- 
rectives, and inserting the appropriate microt asking 
code. The output from PREMULT is then passed 
through the CFT compiler. The microtasking di- 
rectives select the maximum number of CPU’s to 
be used, identify the routines and DO loops to be 
microtasked, and identify the synchronization points 
in the code. The details of these directives are given 
in references 19, 20, and 21. In the proposed pro- 
cedure, microtasking was used in steps 2, 5, 6, 7, 
8, 10, and 11 of table I. Synchronization was used 
in steps 11 and 12. To increase the speedup gain 
from multiprocessing, microtasking was done on the 
outerloops of each of the aforementioned steps in 
table I, and therefore, the full arrays (elemental 
and structure arrays) were processed by each CPU, 
rather than dividing each of these arrays between the 
different CPU’s. 

6. Numerical Studies and Performance 
Evaluation of Proposed Procedure 

To assess the effectiveness of the proposed com- 
putational procedure and partitioning strategy, a 
number of dynamic problems of unsymmetric struc- 
tures have been solved by using this procedure on 
two CRAY X-MP/4 computer systems at CRAY Re- 
search, Inc., Mendota Heights, Minnesota — namely, 
an X-MP/48 and an X-MP/416 (see appendix D and 
table II for a summary of the characteristics of these 
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machines). For each problem, the accuracy and com- 
putational time required by the proposed procedure 
were compared with those of the direct analysis of the 
(full) structure. Herein, the results are presented for 
a typical problem of a laminated anisotropic cylin- 
drical panel with an off-center circular cutout. The 
loading is assumed to be uniformly distributed and 
normal to the panel surface and to have a step vari- 
ation in time. The panel is made of graphite-epoxy 
material. The material and geometric characteristics 
of the panel are given in figure 3. 

A total of 192 mixed elements were used for the 
spatial discretization of the panel. Biquadratic shape 
functions were used for approximating each of the 
generalized displacements, and bilinear shape func- 
tions were used for approximating each of the stress 
resultants (a total of 6144 stress-resultant parame- 
ters, 3818 nonzero displacement degrees of freedom, 
and 3818 nonzero velocity components). The element 
characteristics are given in reference 22. 

6.1 Time History Response of the Structure 

The panel was divided into four substructures and 
the proposed procedure was applied to reduce the 
size of the analysis model to that of the correspond- 
ing doubly symmetric structure. Reflection symme- 
try (and antisymmetry) was assumed with respect 
to each of the interfaces between the different sub- 
structures. These symmetries (and antisymmetries) 
are typically exhibited by the response of orthotropic 
panels with a symmetric geometry and symmetric 
boundary conditions when subjected to symmetric 
(and antisymmetric) loading. 

The temporal integration was performed with the 
implicit Park three-step method. Galerkin’s method 
(Crank-Nicholson scheme) and the Gear two-step 
method were used to start the computation (for the 
first and second time steps, respectively). For the 
linear case, the critical time step for numerical sta- 
bility is At cr = 0.1194 /isec. The time step selected 
in the present study was 50 /isec. Both the small 
displacement (linear) and the large displacement (ge- 
ometrically nonlinear) elastic responses of the panel 
were generated for a duration of 3 msec (60 time 
steps). An energy balance check (ref. 23) was per- 
formed at each time step to ensure the accuracy of 
the solution. At each time step, 3 Newton-Raphson 
iterations and an average of 20 PCG iterations (per 
Newton-Raphson iteration) were required to obtain 5 
significant digits of accuracy of the response quanti- 
ties. Solutions obtained by using the proposed strat- 
egy were compared with the corresponding solutions 
obtained by analyzing the full panel (using the same 
temporal integration scheme, same time step, and 


same number of Newton-Raphson iterations at each 
time step). 

The time histories of the normal displacement 
component u> a and the normal velocity component 
w a at point a, the total complementary strain en- 
ergy f7 c , and the total kinetic energy K are shown 

in figure 4. Point a corresponds to a midside node 
which exhibited the maximum absolute value for the 
normal displacement component during the transient 
analysis. Normalized contour plots for the general- 
ized displacements and velocity components at t = 
3.0 msec of both the linear and nonlinear responses 
are shown in figure 5. Each generalized displacement 
and velocity component is normalized by its maxi- 
mum absolute value as given in table III. 

6.2 Efficiency Gain Obtained by Using the 
Proposed Procedure 

To assess the efficiency gain resulting from the use 
of the proposed procedure and partitioning strategy, 
the measured CP times, wall-clock times, and pro- 
cessing rates obtained by using the present compu- 
tational procedure were compared with those of the 
direct analysis of the panel (no partitioning). The 
measured analysis times and processing rates for the 
first 10 time steps are given in table IV and in fig- 
ures 6 to 8. The CP times and processing rates 
on a single CPU were obtained by using the FLOP 
TRACE (on the X-MP/48) and PERF TRACE (on 
the X-MP/416) utilities of the CRAY computer. (See 
ref. 24 and appendix D.) The wall-clock times for two 
and four CPU’s were recorded. 

The application of the partitioning strategy out- 
lined in section 3 resulted in reducing the number of 
nonzero displacement and velocity degrees of freedom 
from 3818 to 971 and the stress-resultant parameters 
from 6144 to 1536. The semibandwidth of the global 
matrices was reduced from 700 to 315. 

When a single CPU was used, the measured wall- 
clock time for the direct analysis was 319.7 sec on 
the X-MP/416. The corresponding wall-clock time 
for the proposed procedure was 122.6 sec. Because 
of extensive use of vectorization and chaining in the 
implementation, the sustained processing rate for the 
direct analysis was 160 MFLOPS (Millions of FLoat- 
ing point Operations Per Second). The correspond- 
ing sustained rate for the proposed procedure was 
only 119.6 MFLOPS. 

The measured CP times and speed or processing 
rates on a single CPU for the different modules of the 
proposed strategy are shown in figures 6 and 7. As 
expected, the majority of the time, 91.3 sec (75 per- 
cent of the total time), is expended in the decompo- 
sition of the four preconditioning matrices [K] with 
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different symmetry /antisymmetry conditions at each 
Newton-Raphson iteration of each time step. The to- 
tal time expended in the PCG iterations was 11 sec 
(9 percent of the total time). As can be seen from 
figure 7, the processing rates of most of the mod- 
ules was over 100 MFLOPS. The analysis times for 
the proposed procedure on two and four CPU’s were 
66.7 and 36.3 sec, respectively, on the X-MP/416. 
The speedups gained by multiprocessing are 1.84 and 
3.38 on two and four CPU’s, respectively. 

The total speedup St achieved by the proposed 
procedure is defined as the ratio of the wall-clock 
time for the direct analysis of the full structure on 
a single CPU divided by the wall-clock time for the 
partitioned structure on multiple CPU’s. The values 
of St on the X-MP/416 were 2.61, 4.79, and 8.81 for 
one, two, and four CPU’s, respectively. 

The corresponding analysis times, speedups, and 
values of St on the X-MP/48 are given in figure 8 
and table IV. The reduction in wall-clock times and 
the increase in speedups on the X-MP/416 (as com- 
pared with those on the X-MP/48) are due primar- 
ily to the increased number of interleaved memory 
banks, which results in fewer memory conflicts. It 
is anticipated that the speedup ratios can be in- 
creased by further optimizing the FORTRAN code 
used in implementing the computational procedure 
of the partitioning strategy. The sustained process- 
ing rate would then be increased to that of the direct 
analysis implementation. 

7. Comments on Proposed Computational 
Procedure 

The numerical results presented in the preced- 
ing section clearly demonstrate the effectiveness of 
the proposed partitioning strategy for predicting the 
nonlinear dynamic response of unsymmetric struc- 
tures on multiprocessor computers. In particular, the 
following comments can be made regarding the key 
elements and the procedure and its key elements: 

1. Even though the mixed finite element mod- 
els used herein have been shown to be equivalent 
to reduced integration displacement models (ref. 22), 
the use of mixed (primitive variable) formulation was 
found to be essential for solving highly unsymmetric 
problems, with strong coupling between the symmet- 
ric and antisymmetric components of the response. 
The internal forces (or stress resultants) should only 
be eliminated after , not before , the application of op- 
erator splitting . 

2. The application of operator splitting in con- 
junction with symmetry transformations allows the 
restructuring of the discrete equations at each time 
step to delineate the symmetric and anti- 


symmetric vectors constituting the response, as well 
as the coupling terms between these vectors. 

3. The preconditioned conjugate gradient (PCG) 
was found to be a highly efficient and stable tech- 
nique for handling highly unsymmetric problems. 

4. The proposed computational procedure can 
be thought of as generating the dynamic response 
of an unsymmetric structure, using small or large 
perturbations from the response of a simpler structure 
in which the symmetric/antisymmetric components 
of the response vector, at any time, are uncoupled. 

5. The proposed procedure can also be used for 
the efficient dynamic analysis of unsymmetric struc- 
tures on single-processor computers. This is because 
the uncoupled equations, for the symmetric and 
antisymmetric response quantities, are the same ex- 
cept for the terms associated with the interface un- 
knowns. (See, for example, eqs. (B26) to (B31).) The 
efficient generation of the symmetric and antisym- 
metric response quantities (eqs. (C3), (C6) to (C8), 
and (C28) to (C31)) can therefore be accomplished 

by 

(a) Single partial decomposition of one set of 
equations (e.g., the equations associated with {Y} 
which do not include the symmetry/ antisymmetry 
conditions along the interfaces) 

(b) Incorporation of the symmetry /anti- 
symmetry conditions, completing the decomposition, 
evaluating the right-hand sides, and successively gen- 
erating the symmetric and antisymmetric response 
vectors 

6. As the number of symmetry transformations 
increases, the coupling between the symmetric/ 
antisymmetric components of the response vector in- 
creases and, therefore, the number of iterations in 
the PCG technique increases. Consequently, the 
proposed procedure may not be effective on multi- 
processor computers with fine granularity and small 
local memories (e.g., the hypercubes and the 
Connection Machine of Thinking Machine, Inc.). 

8. Concluding Remarks 

A computational procedure is presented for non- 
linear dynamic analysis on vector multiprocessor 
computers. The procedure is based on a novel hier- 
archical partitioning strategy in which the response 
of an unsymmetric structure at any time instant is 
approximated by a linear combination of symmet- 
ric and antisymmetric response vectors (or modes), 
each obtained by using only a fraction of the degrees 
of freedom of the original finite element model. 

The analytical formulation is based on a 
moderate-rotation geometrically nonlinear theory of 
the structure. A mixed formulation is used in 
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which the fundamental unknowns consist of the 
internal forces (or stress resultants), the generalized 
displacements, and the velocity components of the 
structure. The spatial discretization is performed by 
using mixed finite element models with the stress re- 
sultants allowed to be discontinuous at interelement 
boundaries. An implicit, multistep, one-derivative 
scheme is used for the temporal integration. 

The three key elements of the procedure which 
result in a high degree of concurrency throughout 
the solution process are (1) mixed (or primitive vari- 
able) formulation with independent shape functions 
for the different fields, (2) operator splitting or re- 
structuring of the governing discrete equations at 
each time step to delineate the symmetric and anti- 
symmetric vectors constituting the response, and 
(3) two-level iterative process for generating the re- 
sponse of the structure. The top- and bottom- 
level iterations (outer and inner iteration 
loops) are performed by using Newton-Raphson 


and preconditioned conjugate gradient (PCG) 
techniques, respectively. 

The procedure has been implemented on the 
CRAY X-MP/4 computers at Cray Research, Inc., 
Mendota Heights, Minnesota. An assessment is made 
of the speedup resulting from the use of the proposed 
procedure. The wall-clock times, central processing 
times, and processing rates are presented for a typi- 
cal problem of the dynamic response of a laminated 
anisotropic cylindrical panel with an off-center circu- 
lar cutout. The loading is assumed to be uniformly 
distributed and normal to the panel surface and to 
have a step variation in time. The use of the forego- 
ing procedure and partitioning strategy reduces the 
total computational time by nearly an order of mag- 
nitude compared with that required by the direct 
analysis (on a single processor) for the first 10 steps. 

NASA Langley Research Center 
Hampton, VA 23665-5225 
November 14, 1988 
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Appendix A 


Governing Discrete Equations of the Structure 


In the present study, ihe temporal integration is performed by using an implicit, multistep, one-derivative 
scheme. For a constant time step size At, the linear multistep operators have the following form (see refs. 25 
and 26): 


or 


A tfc 


t: = At ^2 /?,_/ 

Z=0 1=0 






where refers to any of the fundamental unknowns; m is the number of steps; and a and (3 are coefficients 
associated with the various difference operators. The values of these coefficients for the three operators 
considered in the present study are given in table AI. 


Table AI. Coefficients in Multistep Operators 


Coefficient 

Galerkin’s 

method 

Gear 

two-step 

method 

Park 

three- step 
method 



1 

1 

1 

<*i-l 

-i 

- 4/3 

- 1.5 

2 

0 

1/3 

0.6 

»*- 3 

0 

0 

- 0.1 

Pi 

273 

2/3 

0.6 

Pi 1 

1/3 

0 

0 


For the application of the multistep integration schemes, it is convenient to transform the governing 
equations of the structure (eqs. (1) and (2)) into a system of simultaneous first-order ordinary differential 
equations. This is accomplished by differentiating the constitutive relations (eqs. (1)) with respect to time. 
The resulting equations can be written in the form: 

j t ([F]{H}-{G(X)}) = [S\{V} (A3) 

If equation (A2) is used, the governing differential equations (eqs. (2) and (A3)) can be transformed into a 
system of nonlinear simultaneous algebraic equations in {H} and {V } at time i. Based on an initial estimate for 
{A} and/or {V'}, an iterative Newton-Raphson scheme can be used for the solution of the nonlinear algebraic 
equations. The iterative process is represented by the following equations for the rth iteration cycle: 


[K]S ,) {^>l r) = M}i r) 

mi r+1) = Mi r) + 

where 

W= HM ^([Si+ra ] 

. s y mm a ii M 1 + [Sxf ] f . 


(A4) 

(A5) 


(A6) 
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r f H | (r) = y-(_ t F l r h | (r) 

l fv /, t=0 \ Symm ai_i[M] l ^ J i-l 


{At^N^X)}^^ 


(«■) 


+ A t/3, 


^L) 


W 


-PI 


(A7) 

(A8) 


If the stress-resultant parameters are eliminated, then equations (A4) can be replaced by the following 
equations in the velocity components: 


where 


[KvPi A^}J r) = {Q v }S r) 


\ K v]^ = [Kw] - [Khv] T I k hh\ 1 [ k hv] 

{Qv}^ = -{/wi r) + [KHvflKHHrHfH}^ 


[Kvv] = Oii[M] + 


(Atft ) 2 

« t 


'3iv p i( r ) 


[ K Hv] = At/% 



pG p ]( r )\ 

j 


[*////] = -«.•[*! 


(A9) 

(A10) 

(All) 

(A12) 

(A13) 

(A14) 


Note that since the stress resultants are allowed to be discontinuous at interelement boundaries, the elimination 
can be done on the element level (i.e., before assembly). 

The nodal displacements {X}^ r+1 ^ at the end of the rth iteration cycle, at time i, are computed by using 
the multistep formula: 


{X}\ r+1) =At^L{V}j r+1) 


Oti 


+ 


1 m / 

-Hi-*- 
“* 1=1 ' 


+ At/?, 




(A15) 


The iterative process is continued until convergence. An assessment of the solution accuracy can be made 
by using an energy balance check at the end of each time step and reducing the time step when the error 
exceeds a prescribed tolerance. (See ref. 23.) 


9 


Appendix B 


Symmetry Transformations 

For the application of symmetry transformations, the structure is first divided into two partitions (or 
substructures), m = 2. For convenience, each of the two substructures is assumed to have the same number 
of elements, the same number of nodal displacements, and the same number of nodal velocities. In addition, 
the interface between the two substructures is assumed to be along the displacement and velocity nodes, and 
no stress parameters are associated with the interface. As an aid to the visualization of the symmetric and 
antisymmetric vectors, a symmetrized finite element grid is introduced which has the same symmetry as that 
of the vectors. The symmetrized grid is constructed on a symmetric domain having the same planform (and/or 
surface) area as that of the original unsymmetric structure. (See fig. 1.) 

The vectors of stress parameters, nodal displacements, nodal velocities, and external loads { H }, { X }, {V }, 
and {Q} are partitioned into subvectors {H}i,{H} 2 ]{X}i,{X} 2 ,{X}s]{V}i,{V} 2 y {V}^; and {Q}i,{Q} 2 5 
{Q}$. The subvectors with subscript 1 are associated with one of the substructures, the subvectors with 
subscript 2 are associated with the other substructure, and the subvectors with subscript 3 are associated with 
the interface between the two substructures. 

The matrices [F], [5], and [ M ] and the vectors (G(X)}, (iV(if,X)}, and {Q} used in the discrete equations 
of the structure (eqs. (A6) and (A7)) are partitioned in submatrix blocks as follows: 

(Bl) 

(B2) 


(B3) 


(B4) 


(B5) 


(B6) 


where a dot in equations (Bl) to (B3) refers to a zero submatrix (or subvector). The symmetric and 
antisymmetric partitions of the vectors {/f}, {.X}, {V}, and {Q} are defined as follows: 


fjrl = [^11 
1 J [ • *22 

[ s ] = f Su , f 13 ' 

L * *=>22 *->23 J 

Mu * Mi 3 
[M] = M22 M23 

Symm M33 _ 

f N^HuXuXs) ) 

{N(H,X)}=\ N 2 (H 2 ,X 2 ,X 3 ) \ 

(NiiHuH^XuX^Xa) J 

f Qi 
{Q} = { Q2 
( Qz 


{ii-ansi 

{ v «} = [?v] { vj } 
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and 


{&H rgl 


Tq J 


{£} 


(BIO) 


where 


[Th] = 2 


m = 2 




m -w 


Cfr] = [TV] = [fg] = - 


[T’x] = [TV] = [Tg] = - 


[I] [I\ 


\I) -\I] 


(BH) 

(B12) 

(B13) 

(B14) 


and the matrix [/] is the identity submatrix. Again, a bar (“) and a tilde (~) over a vector denote the 
symmetric and the antisymmetric components of that vector. In equations (B7) to (BIO), the subscripts s and 
a refer to the symmetric and antisymmetric components, respectively. Note that {H} 2 , {X} 2 , {V}2i and {Q } 2 
account for the change in sign of the antisymmetric components associated with the symmetric response (e.g., 
transverse shear stress resultants, in-plane displacements, and rotation components in a shell structure, see 
ref. 27). For symmetric structures subjected to symmetric loading {H } 2 = {X}2 = {X}i, {V }2 = { V }\ , 

and {Q} 2 = {Q}i. 

The following relations follow from equations (B7) to (B14): 

] i } (B15) 


where 



[Hi ' 

l- 


l H 2 j 

f~ 

1 

f*il 





I 

1^3 J 



r vi 1 



\v 2 \ 





m = 

1 ■ 



[T] 

\Tx\ = [T V ] = 

[T] 


= m < 



(Bi6) 


(BIT) 


[T] 

-M 


[r. 


[Ta]J 


(B18) 


(B19) 


The vectors {X^}, {Vs s } and {X 3 a }, {V^ a } refer to the symmetric and antisymmetric components of {X}s 
and {V}& a dot refers to zero submatrix; [T s ] and [r o ] are diagonal submatrices with nonzero (unit) entries 
corresponding to the symmetric and antisymmetric components of each vector, such that 


[r .] + [r 0 ] = [/] 


(B20) 
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Equations (A4) are now embedded in a single-parameter family of equations of the form: 


... iM 


[K]i + Xo[K\i 


{A^}W = {0}l r) + Ao{Q}i r) 


where 


[*li r) - 


[kp = 


Symm a { [M] + [3^] ^ 

-a, [F] At fa ^[5] + J 


Symm a,[M] + ^^fe| 


{O}! 0 = 


{Q}\ r) = 

[F) = 

[F} = 


-ttr 

-mr 

Fn + F22 
Fn - F22 


Fn + F22 
Fn - F22 


|S|-[ 

Ms,,- 


S11 + S22 (5i3 + 523) [r a ] 


522 (5i3 - 523) [r«] 


5 n + S22 ( 5 i 3 - 523) [r a ] 
5 n - S22 (513 + 523) [r a ] 


[M] = 


Mu + M22 (M13 + M 2 3)(r a ] 

[r.] T {M33][r,] 


[ Symm 


Mu + M22 (M13 — M 2 3)[r 0 ] 

[ra] T [M 33 ][r«] 


m = 


Mil — M 22 (M 13 + M 23 )[ro] 

[T»] T (iMiaf - [M 2 3 ] t ) [r 4 ] M33[r«] 


Symm 


(/„}={ </»>! + </»>*} 


(B21) 

(B22) 

(B23) 

(B24) 

(B25) 

(B26) 

(B27) 

(B28) 

(B29) 

(B30) 

(B31) 

(B32) 
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(B33) 


{fH} ~ { {/«> i - Unh } 

( {/vh + {fv} 2 } 

{fy} = J [ r a] T {/y >3 i (B34) 


(fv} = 


{fv} 1 - {fv } 2 

[r 0 ] r {/v}3 


(B35) 


and Aq is a tracing parameter which identifies all the coupling terms to the symmetric/antisymmetric 

dG dG dN I dN 

components of the response vectors. The expressions for , and are similar to 

those of [3], [5], [Af], and [Af], respectively. 

The following comments can be made in connection with the symmetry transformations: 

1. When Aq = 1, equations (B21) are equivalent to equations (A4). The case Ao = 0 corresponds to 
the uncoupled equations associated with the symmetric/antisymmetric components of the response vectors 

{Atf}J r) and {AF},— 

2. The presence of nonzero terms in the matrices with a tilde (~) signifies the coupling between the symmetric 
and antisymmetric modes constituting the unsymmetric response. For a symmetric structure with identical 
substructures, the matrices [F], [5], and [ M ] are zero, and the equations associated with the symmetric and 
antisymmetric partitions of the response vectors are uncoupled. 

3. For Ao = 0, the equations associated with the symmetric and antisymmetric response quantities are 
identical except for the terms associated with the interface unknowns {£V}$ 8 and {A Therefore, only 
one set of these equations was considered, and the symmetry/antisymmetry conditions were applied at the 
interfaces. 

4. The symmetry transformations can be applied in a recursive maimer to effect further reduction in the 
size of the partitions (or substructures). To apply the second symmetry transformation (m = 4), each of 
the response vectors { H} s , { H} a , { V} 8 , { V} a , {X} 8y and {X} a is decomposed into symmetric/antisymmetric 
components (see fig. 2) and the associated matrices in equations (B21), [K] and [if], are now decomposed into 

[R], [. K ], [K] t and [K]. Also, the right-hand side vectors {Q} and {Q} are decomposed into {Q}, {Q}, {Q}, and 

{< 5 >. 
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Appendix C 

Application of Preconditioned Conjugate 
Gradient (PCG) Technique 

For the case m = 4, the solution of the linear alge- 
braic equations associated with each iteration cycle of 
the Newton- Raphson technique (eqs. (8)), using the 
preconditioned conjugate gradient technique, can be 
divided into four major steps which are outlined sub- 
sequently. For convenience, equations (8) are cast in 
the following form (using a single tracing parameter 
A): 


The vectors {R}q refer to the initial residual 
vector. Note that different symmetry /antisymmetry 
conditions are incorporated into the matrices [K] on 
the left-hand sides of equations (C6) to (C8). 

3. Initialize the symmetric/antisymmetric compo- 
nents of the conjugate search direction vectors: 

{2} 0 = {?} Q = 0 {2} 0 = {?}o (CIO) 

{Z}o = {Y } 0 {ho = {ho (Cll) 

Step 2— Line search and updating of solution and 
residual: 


[k]+X([K) + [K] + [K]) 


{AV>} 


= {0} + A({Q} + {$} + {£}) (Cl) 


The solution is sought for A = 1. The solution 
vector in each iteration can be decomposed into four 
symmetric/antisymmetric components as follows: 


For j = 0, step j by 1 until convergence and do 
the following: 


4. Compute the step length otj along the search 
direction: 



(C12) 


where 


{A^} = {A^} + {AV>} + {At/>} + {AV>} (C2) 

The four major steps of the solution are 
Step 1 — Initialization: 

1. Obtain an initial estimate {AV>}o of {Ax/>} by 
solving the equations: 

[R]{AiP}o = {0} (C3) 

Note that {AV>}o is doubly symmetric; therefore, 

{AV>} 0 = {AV>} 0 (C4) 

and 

{A^} 0 = {AV’jo = {A?}o = 0 (C5) 

2. Obtain the symmetric/antisymmetric compo- 
nents of the preconditioned residuals by solving 
the following three equations: 

mho = {Q} - m*ho = {£}o (C6> 

[£]{?}o = {Q} - [K}{ A^}o = {^}o (C7) 

mho = {§> - m^ho = {^>o (cs) 

and 

{ho = {ho = 0) (C9) 


a j = {hjm + {hjm + {y}J{r}j 

+ {hj{hj (ci3) 

bj = mj m, + mj {% + {hj m 3 

+ {hj& 3 (C14) 

m 3 = mm + mm + mhj 

+ mhj (ci5) 

{% = mhj + mh 3 + mhj 
+mhj (ci6) 

{hj = mhj + mhj + mh 3 

+mhj (ci?) 

{% = mhj + mhj + mh 3 

+mh 3 (cis) 

5. Update the symmetric/antisymmetric compo- 
nents of the stress-resultants and velocity incre- 
ments: 

{A^b+i = {A^>b + *j{hj (C19) 
{Ahj+l = {*hj+*j{hj (C20) 
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{A?} j - +1 = {A *}j + aj{Z}j (C21) 

{a5} j+ i = {A?},- + «,■{!},• (C22) 

6. Compute and update the symmetric/antisym- 
metric components of the new residual vector us- 
ing the equations: 

{%+! = -<*;{ !}; + {% (C23) 

{% +1 = -d J {% + {% (C24) 

{R) j+1 = -a j $} j + {R} j (C25) 

{R}j+i = + Wj ( C26 ) 

Step 3 — Convergence check: 

7. If 

l*lj+l < e\R\ 0 (C27) 

where |/?| is the Euclidean norm of the residual 
vector and £ is a prescribed tolerance, then stop, 
otherwise continue. 


Step 4 — Computation of new search direction vector: 

8. Solve for the symmetric/antisymmetric compo- 
nents of the preconditioned residual vector 

[£]{%+! = {%+! (C28) 

[&\{?}j+l = (%+l (C29) 

m{Y } j+ 1 = {R } j+ 1 (C30) 

mhj+l = {R}j + 1 (C31) 

9. Compute the orthogonalization coefficient fij+i 
using 

Pj + 1 = a j+l/ a j (C32) 

where the a’s are defined in equations (C13). 

10. Update the symmetric/antisymmetric compo- 
nents of the conjugate search direction vectors 

{2} j+1 = P j+1 {2} j + {?} j+1 (C33) 

{hj+1 = Pj+lihj + {hj + 1 (C34) 

{hj+l=Pj+l{Z)j + {Yh+l (035) 

ihj+i = %+iihj + {hj+i ( 036 ) 
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Appendix D 

Summary of Major Features of CRAY 
X-MP Computers 

The CRAY X-MP computer system is a multi- 
processor vector machine with shared central mem- 
ory. The two models used in the present study are 
the CRAY X-MP/48 and X-MP/416 computers (se- 
rial numbers 231 and 218) at Cray Research, Inc., 
Mendota Heights, Minnesota. Each one of the two 
models has four CPU’s and an ECL bipolar cen- 
tral memory. A summary of the hardware and soft- 
ware characteristics of the two computers is given in 
table II. 

Each of the CPU’s has the following features: 

8.5 nsec clock period 

14 functional units 

Scalar and vector instructions 

Four parallel memory ports connected to the 

central memory: two for vector and scalar 

fetches, one for result store, and one for 

independent input /output operations. 

The estimated peak performance of each CPU is 
210 MFLOPS. This cam only be achieved by using 
the chaining facility of the CRAY, that is, by over- 
lapping the vector multiply, add, and memory access 
operations. 

The central memory hais a capacity of 8 Mwords 
on the X-MP/48 model and 16 Mwords on the 
X-MP/416 model. The memory is organized into 32 
and 64 interleaved banks on the X-MP/48 and the 
X-MP/416 models. 

The two facilities for multiple CPU utilization 
on the CRAY X-MP system are macrotasking and 


microtasking. (See ref. 28.) Macrotasking is done 
on the FORTRAN subroutine level. User interaction 
with the macrotasking environment is done via calls 
to the macrotasking library. Microtasking is done via 
compiler directives. The granularity of microtasking 
is finer than that of macrotasking (i.e., down to the 
outer do-loop level). The overhead associated with 
microtasking is considerably less than that involved 
in macrotasking. 

A number of performance monitors are available 
on the CRAY X-MP system. The two used in the 
present study are FLOP TRACE on the X-MP/48 
and PERF TRACE on the X-MP/416. (See ref. 24.) 
Each of the two monitors produces a table showing 
the subroutine name, the number of times it was 
called, the execution time, the average time per 
call, the percentage of total program time spent in 
the subroutine, the accumulated percentage of total 
program execution times for subroutines up to the 
current one, the number of additions, multiplications, 
and reciprocals performed by the subroutine, the 
total floating point operations, the ratio of memory 
references to floating point operations, the memory 
reference rate, and the processing rate (in MFLOPS). 
The table is sorted in decreasing percentage, so that 
the most important routines are at the top of the list. 
The same statistics are displayed for the program as 
a whole. 

Although these utilities are very useful in identi- 
fying the time consuming subroutines and the candi- 
date routines for macrotasking, they are used with 
one CPU only. To estimate the speedup result- 
ing from multiprocessing, the only facility currently 
available is measuring the wall-clock time. 
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Table I. Flowchart of Computational Procedure 

Preprocessing Phase 

1. Input problem and model data 

2. Generate the linear modified elemental arrays (see appendix B) 

Solution Phase 

3. Begin time step loop 

4. Begin Newton- Raphson iteration loop 

5. Predict velocity components, stress resultants, and generalized displacements 

6. Generate nonlinear modified elemental arrays on the left-hand side of the equations (see appendix B) 

7. Eliminate stress resultants from elemental equations and assemble the left-hand side for the 
reduced-size model (see eqs. (A9) to (A 14)) 

8. Evaluate right-hand side elemental arrays and assemble them (see eqs. (A7) and (All)) 

9. Begin PCG iteration loop (see appendix C) 

10. Decompose the preconditioning matrices associated with different symmetry /antisymmetry 
conditions (eqs. (C3), (C6), (C7), and (C8)) 

11. Calculate the symmetric/antisymmetric components of the residual vector, the preconditioned 
residuals, step length, orthogonalization coefficient, conjugate search direction, and update the 
response vectors (tasks numbers 2 through 6 and 8 through 10, appendix C) 

12. Check convergence of PCG, if satisfied continue, otherwise go to step 11 

13. Check convergence of Newton- Raphson iterations, if satisfied continue, otherwise go to step 6 

14. Check if t = t max then stop, otherwise set t = t + At and go to step 4 
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Table II. Characteristics of CRAY X-MP/4 Computer Systems Used in Present Study 


Characteristics 

X-MP/48 

X-MP/416 

Number of CPU’s 

4 

4 

Clock cycle, nsec 

8.5 

8.5 

Memory 

8 Mwords® (32 banks) 
central memory 

16 Mwords® (64 banks) 
central memory 

Storage size 

32 Mwords SSD 6 

512 Mwords SSD 6 

Operating system 

COS 1.16 BF2 

UNICOS 3.0.8 

Compiler 

CFT version 1.15 BF2 

CFT version 1.15 BF2 

Performance monitor 
(on one CPU) 

FLOP TRACE 

PERF TRACE 


°1 Mword = 1048 576 words (each word is 64 bits). 

^The SSD (solid-state storage device) functions as a very high speed secondary memory and is 
treated by the operating system and the user as another disk drive. 


Table III. Maximum Absolute Values of Generalized Displacements 
and Velocity Components, {X} and { V }, at t = 3.0 msec 

Laminated anisotropic composite panel with an off-center 
circular cutout subjected to uniform normal loading 
p 0 = —50000 Pa (see figs. 3 and 5) 



Linear 

Nonlinear 

Generalized displacements: 

w 

7T 

0.2940 

1.007 

l°x^ 

0.1945 

0.3634 



0.7922 

1.654 

10 2 x 4>i 

0.9364 

1.825 

10 X 02 

0.1862 

0.5595 

Velocity components: 

o 

i 

X 

0.2554 

0.1775 

10- 2 X%- 

0.6049 

0.2036 

10" 3 x ^ 

0.2848 

0.1237 

10“2 X 01 

0.8394 

0.4421 

1O~ 3 x0 2 

0.1482 

0.1708 
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Table IV. Performance Evaluation of Proposed Strategy on CRAY X-MP/4 Computers 


Laminated anisotropic composite panel with off-center circular cutout subjected 
to uniform normal loading p 0 = —50000 Pa (see figs. 3 and 5) 



X-MP/48 

COS 1.16 BF2 operating 
system, CFT compiler 
version 1.15 BF2 

X-MP/416 

UNICOS 3.0.8 operating 
system, CFT compiler 
version 1.15 BF2 

Full 

structure 

Partitioned 

structure 

Full 

structure 

Partitioned 

structure 

Wall-clock time for first 

10 time steps, sec 

339 

123.1 (1 CPU) 
68.8 (2 CPU’s) 
41.7 (4 CPU’s) 

319.7 

122.6 (1 CPU) 
66.7 (2 CPU’s) 
36.3 (4 CPU’s) 

Sustained speed on 

one CPU, MFLOPS 

152.1 

119.1 

160 

119.6 

Speedup due to 

multiprocessing 


1.0 (1 CPU) 
1.79 (2 CPU’s) 
2.95 (4 CPU’s) 


1.0 (1 CPU) 
1.84 (2 CPU’s) 
3.38 (4 CPU’s) 

Total speedup, St, achieved 
by proposed procedure 

1.0 

2.75 (1 CPU) 
4.93 (2 CPU’s) 
8.13 (4 CPU’s) 

1.0 

2.61 (1 CPU) 
4.79 (2 CPU’s) 
8.81 (4 CPU’s) 
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Figure 1. Original finite element model and reduced-size models (substructures). 




Original model 
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Figure 2. Unknown displacements in original and reduced-size models (substructures). 





o o 

II II 
CM CM 

-©- •-©- 

II II 


•e- •-©- 

ii ii 

3 -3 
ii ii 

CM CM 
3 *3 

II II 
3 -3 


CD 

k. 

CD 

.C 

3 

>* 

u. 

CD 

> 

LU 





E 

II 


E 



in 

CM 

o 

CM 

in 


CD 

in 



o 

o 


in 


fc 

n 

CO 


CO 

E 

E 

C/> 

L- 


d 

n 

00 

CD 

h- 

3 

■5 

CD 

c 

E 

CM 

CO 

CM 

CO 

CD 

o 

CM 

J 

n 

o 

n 

CM 

II 

o 

o 

CD 

O 

CO 

o 

CM 

— i 

cr 

.C 

X 

X 

d 




23 


Figure 3. Laminated anisotropic composite panel with off-center circular cutout used in present study. 


Nonlinear 



Figure 4. Linear and nonlinear dynamic responses of laminated anisotropic composite panel with off-center 
circular cutout subjected to uniform normal loading p Q = —50000 Pa (see fig. 3). 
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uniform normal loading p 0 = —50 000 Pa (see fig. 3); times represent total wall-clock times for first 10 time 
steps. 




NASA Report Documentation Page 

National Aeronautics and 
Space Administration 


1. Report No. 2. Government Accession No. 

NASA TP-2850 


4. Title and Subtitle 

Partitioning Strategy for Efficient Nonlinear Finite Element 
Dynamic Analysis on Multiprocessor Computers 


3. Recipient’s Catalog No. 


5. Report Date 

January 1989 

6. Performing Organization Code 


7. Author(s) 

Ahmed K. Noor and Jeanne M. Peters 

9. Performing Organization Name and Address 

NASA Langley Research Center 
Hampton, VA 23665-5225 


8. Performing Organization Report No. 

L- 16476 

10. Work Unit No. 

505-63-01-11 

11. Contract or Grant No. 


12. Sponsoring Agency Name and Address 
National Aeronautics and Space Administration 
Washington, DC 20546-0001 


13. Type of Report and Period Covered 

Technical Paper 

14. Sponsoring Agency Code 


15. Supplementary Notes 

Ahmed K. Noor and Jeanne M. Peters: The George Washington University, Joint Institute for 
Advancement of Flight Sciences, Langley Research Center, Hampton, Virginia. 

The present research was supported by NASA Grant NAG 1-730 and Air Force Office of Scientific 
Research Grant AFQSR-88-0136, 

16. Abstract 

A computational procedure is presented for the nonlinear dynamic analysis of unsymmetric structures 
on vector multiprocessor systems. The procedure is based on a novel hierarchical partitioning 
strategy in which the response of the unsymmetric structure at any time instant is approximated by 
a linear combination of symmetric and antisymmetric response vectors, each obtained by using only 
a fraction of the degrees of freedom of the original finite element model. The three key elements 
of the procedure which result in a high degree of concurrency throughout the solution process are 

(1) mixed (or primitive variable) formulation with independent shape functions for the different fields; 

(2) operator splitting or restructuring of the discrete equations at each time step to delineate the 
symmetric and antisymmetric vectors constituting the response; and (3) two-level iterative process 
for generating the response of the structure. An assessment is made of the effectiveness of the 
procedure on the CRAY X-MP/4 computers. 


17. Key Words (Suggested by Authors (s)) 
Nonlinear dynamic analysis 
Parallel processing 
Hierarchical partitioning 
Multiprocessor computers 
Symmetry transformations 
Operator splitting 
Mixed formulations 
Iterative techniques 


19. Security Classif.(of this report) 
Unclassified 


18. Distribution Statement 

Unclassified — Unlimited 


20. Security Classif.(of this page) 

Unclassified 


Subject Category 39 


21. No. of Pages 22. Price 

38 A03 


NASA FORM 1626 OCT 86 

For sale by the National Technical Information Service, Springfield, Virginia 22161-2171 


NASA-Langley, 1989 








